Purpose: This project analyzes a number of trends in the Apple App Store. I chose this data because I’m interested in mobile app development and currently have an application on the App Store. My application is for music and social media, so the analysis focuses heavily on those two genres.

# clean up workspace environment
rm(list = ls())
#Packages
library(mosaic)
library(tidyverse)
library(DataComputing)
library(party)

The chunk below loads the two datasets about the Apple App Store.

#This dataset contains a number of analytics about over 7000 applications on the App Store.
#Information includes rating, price, genre, etc.

dataset_1 <- 'C:/Users/angel/Dropbox/Penn State/STAT 184/Project/App_Store_Analysis/AppleStore.csv'

App_Store_Data <- read.csv(file = dataset_1, header=TRUE, sep=",")

#This second dataset, judging by its file name, contains the App Store description text for the applications.
dataset_2 <- 'C:/Users/angel/Dropbox/Penn State/STAT 184/Project/App_Store_Analysis/appleStore_description.csv'

App_Store_Description_Data <- read.csv(file = dataset_2, header=TRUE, sep=",")

Sample of the App Store Dataset

App_Store_Data %>%
  sample_n(size = 10)

Sample of the Description Dataset

App_Store_Description_Data %>%
  sample_n(size = 10)

General Analysis

The first thing I wanted to do with this dataset was to gather some general statistics about the apps in the App Store.

#This code displays the number of apps, the mean app size in bytes, the mean user rating, and the mean number of supported languages.
App_Store_Data %>%
  summarise(num_apps = n(),
            mean_app_size = mean(size_bytes),
            mean_app_rating = mean(user_rating),
            mean_supported_languages = mean(lang.num))
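
A mean size in bytes is hard to read at a glance, so a variant of the summary above might report megabytes and add a median rating next to the mean. The sketch below is only an illustration: `toy_apps` and its values are made up, and it reuses the column names `size_bytes`, `user_rating`, and `lang.num` from the real data.

```r
library(dplyr)

# Toy stand-in for App_Store_Data; the values are made up for illustration.
toy_apps <- data.frame(
  size_bytes  = c(100e6, 250e6, 50e6, 400e6),
  user_rating = c(4.5, 3.0, 0.0, 4.0),
  lang.num    = c(1, 10, 2, 31)
)

# Same summary as above, but with the size in megabytes (1 MB = 1e6 bytes)
# and a median rating, which is less sensitive to extreme values.
toy_summary <- toy_apps %>%
  summarise(num_apps = n(),
            mean_app_size_mb = mean(size_bytes) / 1e6,
            median_app_rating = median(user_rating),
            mean_supported_languages = mean(lang.num))

toy_summary
```

Replacing `toy_apps` with `App_Store_Data` should give the same kind of table for the full dataset, assuming the columns load as numeric.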

App Store Breakdown

The following is a general breakdown of the categories of apps in the App Store. Below is a table with the number of apps in each category, followed by a bar plot of the same counts. The plot makes it easier to see how diverse the App Store is. Based on this data, it’s clear that games make up a very large share of the apps in the App Store.

#count the number of apps in each primary genre
App_Categories <- App_Store_Data %>%
  group_by(prime_genre) %>%
  summarise(num_on_store = n())

App_Categories
ggplot(data = App_Categories,
       aes(x = prime_genre, y = num_on_store, fill = prime_genre)) +
  geom_bar(stat = 'identity', position = 'stack', width = .9) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
  xlab('App Category') +
  ylab('Number on Store')
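
The plot above lists categories alphabetically. One possible refinement is to sort the bars by count so that the largest categories stand out. This is a minimal sketch, not part of the original analysis: `toy_categories` is a made-up stand-in for `App_Categories`, and its counts are invented.

```r
library(ggplot2)

# Toy stand-in for App_Categories; the counts are made up for illustration.
toy_categories <- data.frame(
  prime_genre  = c("Games", "Music", "Social Networking"),
  num_on_store = c(30, 5, 8),
  stringsAsFactors = FALSE
)

# reorder() sorts the genre levels by the negated count,
# so the largest category comes first on the x axis.
genre_order <- with(toy_categories,
                    levels(reorder(prime_genre, -num_on_store)))

p <- ggplot(toy_categories,
            aes(x = reorder(prime_genre, -num_on_store), y = num_on_store)) +
  geom_col(width = .9) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
  xlab('App Category') +
  ylab('Number on Store')

p
```

`geom_col()` is the ggplot2 shorthand for `geom_bar(stat = 'identity')`, so the bars are drawn the same way as in the original plot.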


#layer1 <- geom_point(data = App_Store_Data, aes(shape = user_rating))
#layer2 <- geom_point(data = App_Store_Data, aes(shape = user_rating_ver))

#App_Store_Data %>%
 # ggplot(aes(x = id, y = user_rating))+geom_point()

Rating vs Category

Specific classification for music and social media

#music data
Music_App_Data <-
  App_Store_Data %>%
  filter(prime_genre == 'Music')

#social network data
Social_Network_App_Data <-
  App_Store_Data %>%
  filter(prime_genre == 'Social Networking')


#get highest rated music app names


#get highest rated social media app names


#compare ratings of free vs paid for each. 


#Use ML to take the user_rating_ver (rating of the current app version) and predict the user_rating (overall app rating) for each category
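
The first three planned steps above (highest-rated app names and free versus paid ratings) could be sketched roughly as follows. This is an illustration under assumptions, not the finished analysis: `toy_apps` and its values are made up, and the column names `track_name` and `price` are assumed rather than taken from the code above, so they would need to be checked against the real `App_Store_Data`.

```r
library(dplyr)

# Toy stand-in for App_Store_Data; names and values are made up.
# track_name and price are assumed column names.
toy_apps <- data.frame(
  track_name  = c("App A", "App B", "App C", "App D", "App E", "App F"),
  prime_genre = c("Music", "Music", "Music",
                  "Social Networking", "Social Networking", "Social Networking"),
  price       = c(0, 0.99, 0, 0, 2.99, 0),
  user_rating = c(4.5, 4.0, 3.0, 3.5, 4.5, 2.5),
  stringsAsFactors = FALSE
)

# Names of the two highest-rated music apps.
top_music <- toy_apps %>%
  filter(prime_genre == "Music") %>%
  arrange(desc(user_rating)) %>%
  head(2) %>%
  pull(track_name)

# Mean rating of free vs paid apps within each of the two genres.
free_vs_paid <- toy_apps %>%
  filter(prime_genre %in% c("Music", "Social Networking")) %>%
  mutate(pricing = ifelse(price == 0, "free", "paid")) %>%
  group_by(prime_genre, pricing) %>%
  summarise(num_apps = n(), mean_rating = mean(user_rating)) %>%
  ungroup()

top_music
free_vs_paid
```

The same `filter()` and `arrange()` pattern with `prime_genre == "Social Networking"` would give the highest-rated social networking apps.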